Text Analysis with LingPipe 4

Authors

  • Bob Carpenter
  • Breck Baldwin
Abstract

    AbstractExternalizable.compileTo(classifier,file);
    @SuppressWarnings("unchecked")
    ConditionalClassifier<CharSequence> compiledClassifier
        = (ConditionalClassifier<CharSequence>)
            AbstractExternalizable.readObject(file);
    file.delete();

The static utility method compileTo() from LingPipe’s AbstractExternalizable class (in the util package) does the writing. The same thing could be done through LingPipe’s Compilable interface directly, using the traditional naive Bayes class’s method compileTo(ObjectOutput). We deserialize using another utility method, readObject(), which reads and returns serialized objects from files (there are also utility methods to read compiled or serialized models from the class path as resources).

Usually, one program would write the compiled file and another program, perhaps on another machine or at a different site, would read it. Here, we have put the two operations together for reference. Note that the unchecked cast warning on reading back in is suppressed; the cast may still fail at runtime if the file holds an object of a different type, and the read itself fails if the file isn’t a serialized object at all. Note also that the object read back in is assigned to a ConditionalClassifier; attempting to cast it to a TradNaiveBayesClassifier would fail, as the compiled version is not an instance of that class.

We do the same thing for serialization, using a different utility method to serialize, but the same readObject() method to deserialize.

    AbstractExternalizable.serializeTo(classifier,file);
    @SuppressWarnings("unchecked")
    TradNaiveBayesClassifier deserializedClassifier
        = (TradNaiveBayesClassifier)
            AbstractExternalizable.readObject(file);
    file.delete();

Here, we are able to cast the deserialized object back to the original class, TradNaiveBayesClassifier. Because deserialization results in a traditional naive Bayes classifier, we may provide more training data. Repeating the serialization and deserialization with a different variable, we can go on to train.

    String s = "hardy har har";
    Classified<CharSequence> trainInstance
        = new Classified<CharSequence>(s,hisCl);
    deserializedClassifierTrain.handle(trainInstance);
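The difference between the two round trips can be sketched with plain Java serialization. This is only an illustration under stated assumptions, not LingPipe's implementation: the types Scorer, TrainableScorer and CompiledScorer and the helper roundTrip() are hypothetical stand-ins, and an in-memory byte array replaces the file so that the example is self-contained. The trainable class keeps its counts when serialized, so training can resume; its compile() method freezes the estimates into a different, read-only class, so a cast back to the trainable class fails.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Hypothetical stand-in types; none of these names are LingPipe classes.
class CompileVsSerializeSketch {

    interface Scorer {
        double prob(String token);
    }

    // Trainable form: keeps raw counts, so training can resume after a round trip.
    static final class TrainableScorer implements Scorer, Serializable {
        private static final long serialVersionUID = 1L;
        private final HashMap<String, Integer> counts = new HashMap<>();
        private int total = 0;

        void handle(String token) {
            counts.merge(token, 1, Integer::sum);
            total++;
        }

        @Override
        public double prob(String token) {
            return total == 0 ? 0.0 : counts.getOrDefault(token, 0) / (double) total;
        }

        // "Compilation" here: freeze the current estimates into a different class.
        CompiledScorer compile() {
            HashMap<String, Double> probs = new HashMap<>();
            for (String token : counts.keySet()) {
                probs.put(token, prob(token));
            }
            return new CompiledScorer(probs);
        }
    }

    // Read-only form: holds fixed probabilities and has no handle() method.
    static final class CompiledScorer implements Scorer, Serializable {
        private static final long serialVersionUID = 1L;
        private final HashMap<String, Double> probs;

        CompiledScorer(HashMap<String, Double> probs) {
            this.probs = probs;
        }

        @Override
        public double prob(String token) {
            return probs.getOrDefault(token, 0.0);
        }
    }

    // Writes the object to a byte array and reads it back in.
    static Object roundTrip(Serializable obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        TrainableScorer trained = new TrainableScorer();
        for (String token : "hardy har har".split(" ")) {
            trained.handle(token);
        }

        // Compiled round trip: usable through the shared interface only.
        Object compiled = roundTrip(trained.compile());
        System.out.println(((Scorer) compiled).prob("har"));
        try {
            TrainableScorer wrong = (TrainableScorer) compiled;
            wrong.handle("har");
        } catch (ClassCastException e) {
            System.out.println("compiled form is not trainable");
        }

        // Serialized round trip: the original class comes back and can keep training.
        TrainableScorer resumed = (TrainableScorer) roundTrip(trained);
        resumed.handle("har");
        System.out.println(resumed.prob("har"));
    }
}
```

Declaring the variable with the shared interface type, as the excerpt does for the compiled classifier, is what keeps the read-back code valid for either form.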


Similar resources

Phrasal Queries with LingPipe and Lucene: Ad Hoc Genomics Text Retrieval

The hypothesis we explored for the Ad Hoc task of the Genomics track for TREC 2004 was that phrase-level queries would increase precision over a baseline of token-level terms. We implemented our approach using two open source tools: the Apache Jakarta Lucene TF/IDF search engine (version 1.3) and the Alias-i LingPipe tokenizer and named-entity annotator (version 1.0.6). Contrary to our intuition...


Character Language Models for Chinese Word Segmentation and Named Entity Recognition

We describe the application of the LingPipe toolkit (Alias-i 2006) to Chinese word segmentation and named entity recognition. We provide results for the third SIGHAN Chinese language processing bakeoff (Levow 2006). F1 measures on the best performing corpora were .972 for word segmentation and .855 for person/location/organization named-


Cross-species Gene Normalization at the University of Iowa

Background: With the increasing availability of full text articles through open access publishing, the scope of biomedical text mining is no longer limited to the abstracts of research literature. Cross-species gene normalization using full-text articles is an important step towards the use of full text articles in the area of biomedical text-mining research. This was one of the goals of the Bi...


Quality Assurance of Bioinformatics Software: A Case Study of Testing a Biomedical Text Processing Tool Using Metamorphic Testing

Bioinformatics software plays a very important role in making critical decisions within many areas including medicine and health care. However, most of the research is directed towards developing tools, and little time and effort is spent on testing the software to assure its quality. In testing, a test oracle is used to determine whether a test is passed or failed during testing, and unfortunately...


UAIC Participation at RTE5

Textual entailment recognition is the task of deciding, given two text fragments, whether the meaning of one text can be deduced from the other. This year, at our third participation in the RTE competition, we improved the system built for the RTE4 competition. Main Task: The main idea of our system is to map every word in the hypothesis to one or more words in the text. For that, we transform ...




Publication date: 2012